1. Introduction


The  organization  of  movement  in  a  changing  two-dimensional  image  provides  a 
valuable  source  of  information  for  analyzing  the  environment  in  terms  of  objects, 
their motion in space, and their three-dimensional structure. It is not surprising,
therefore,  that  the  analysis  of  visual  motion  plays  a  central  role  in  biological  vision 
systems.  Sophisticated  mechanisms  for  extracting  and  utilizing  motion  exist  even  in 
simple animals. For example, the frog has efficient "bug detection" mechanisms that
respond selectively to small, dark objects moving in its visual field [1]. The ordinary
housefly can track moving objects and discover the relative motion between a target
and its background, even when the two are identical in texture, and therefore
indistinguishable in the absence of relative motion [2].

In higher animals, including primates, the analysis of motion is "wired into"
the visual system from the earliest processing stages. Some species, such as the
pigeon [3] and rabbit [4] (see [5] for other examples), perform rudimentary motion
analysis  at  the  retinal  level.  In  other  animals,  including  cats  and  primates,  the  first 
neurons  in  visual  cortex  to  receive  input  from  the  eyes  are  already  involved  in  the 
analysis  of  motion:  they  respond  well  to  stimuli  moving  in  one  direction,  but  little, 
or  not  at  all,  to  motion  in  the  opposite  direction  [6,7]. 

In  some  animals,  visual  motion  is  used  in  the  guidance  of  locomotion  and 
the  control  of  body  motion.  The  plummeting  gannet  [8],  for  example,  uses  visual 
flow  information  to  stretch  back  its  wings  a  fraction  of  a  second  before  it  hits 
the  water.  Perhaps  the  most  remarkable  use  of  visual  motion  is  the  recovery 
of  three-dimensional  shape  using  motion  information  alone.  This  capacity  of  the 
human  visual  system  has  been  demonstrated  in  the  studies  of  Wallach  and  O’Connell 
[9]  and  Johansson  [10,11]. 

The  extensive  use  of  motion  by  biological  systems,  and  in  particular  the  human 
visual  system,  demonstrates  the  feasibility  of  carrying  out  certain  information 
processing tasks and helps to establish specific goals for the analysis of time-varying
imagery.  This  analysis  divides  naturally  into  two  parts.  The  first  stage  is  the 
measurement  of  motion;  for  example,  the  assignment  of  direction  and  magnitude 
of  velocity  to  elements  in  the  image,  on  the  basis  of  the  changing  intensity  pattern. 
The  second  is  the  use  of  motion  measurements;  for  example,  to  separate  the  scene 
into  distinct  objects,  and  infer  their  three-dimensional  structure. 

In this paper, we present a computational study of the measurement of visual
motion, a problem that has proved surprisingly difficult, both in computer
vision and in modelling biological vision systems. We will present the
general  problem  of  motion  measurement  in  Section  2,  and  discuss  methods  that 
have  been  proposed  for  its  solution.  Section  3  presents  a  specific  scheme,  proposed 
by  Marr  and  Ullman  [12],  for  extracting  the  first  motion  measurements  from  the 
changing  image.  The  initial  measurements  do  not  yet  specify  the  true  motion  of 
objects  in  the  changing  image,  and  must  be  combined  in  some  way.  This  raises  the 
motion  integration  problem,  which  will  be  discussed  in  Sections  3  and  4.  Section 
5  presents  some  implications  for  the  analysis  of  motion  in  biological  vision  systems. 


2. Motion Detection and Measurement

The  motion  of  elements  and  regions  in  an  image  is  not  given  directly,  but  must 
be  computed  from  more  elementary  measurements.  The  initial  registration  of  light 
by the eye or by electronic imaging devices can be described as producing a
two-dimensional array of time-dependent light intensity values, I(x,y,t). Motion in
the  image  can  be  described  in  terms  of  a  vector  field  V(x,  y,  t)  giving  the  velocity 
of  a  point  with  image  coordinates  (x,  y)  at  time  t.  The  first  problem  in  analyzing 
visual  motion  is  the  computation  of  V(x,y,t)  from  I(x,y,t).  This  computation  is 
the  measurement  of  visual  motion. 

In  some  cases  it  may  be  sufficient  to  detect  only  certain  properties  of  the 
velocity  field,  rather  than  measure  it  completely  and  precisely.  For  example,  in  order 
to  respond  quickly  to  a  moving  object,  motion  must  be  detected,  but  not  necessarily 
measured.  Other  tasks,  such  as  the  recovery  of  three-dimensional  structure  from 
motion,  require  a  more  complete  and  accurate  measurement  of  the  velocity  field 
[13-17].

The  measurement  of  motion  may  be  performed  at  different  stages  in  the 
processing  of  an  image,  utilizing  different  motion  primitives.  It  is  useful  to  draw  a 
distinction  between  two  main  schemes.  At  the  lowest  level,  motion  measurements 
may  be  based  directly  on  the  local  changes  in  light  intensity  values;  these  are 
called intensity-based schemes. Alternatively, it is possible to first identify features
such  as  edges  and  their  termination  points,  corners,  blobs,  or  regions,  and  then 
measure  motion  by  matching  these  features  over  time,  and  detecting  their  changing 
positions.  Schemes  of  this  type  are  called  token-matching  schemes.  These  two 
modes  of  motion  detection  and  measurement  give  rise  to  different  computational 
problems,  and  consequently  to  different  kinds  of  processes  in  biological  as  well  as 
computer  vision  systems. 

2.1.  Intensity-based  Schemes  for  Motion  Measurement 

Two  main  types  of  intensity-based  schemes  have  been  advanced  for  biological 
and computer vision systems: correlation techniques and gradient methods.
Cross-correlation of raw intensity values has been used in computer vision applications
[18-21],  and  has  been  proposed  as  a  model  for  motion  measurement  in  the  human 
visual  system  [22-24].  Related  to  cross-correlation  schemes  are  subtraction  schemes, 
involving  simple  differencing  operations  between  successive  frames.  In  computer 
vision,  such  schemes  are  primarily  used  for  the  detection  of  motion,  and  object 
segmentation  [25-28];  together  with  cross-correlation,  they  have  been  utilized  for 
the measurement of motion [26,28]. A fundamental problem of most correlation
and  subtraction  schemes  is  that  they  assume  the  image  (or  a  large  portion  of  it) 
moves  as  a  whole  between  the  two  frames.  Images  containing  independently  moving 
objects  and  image  distortions  induced  by  the  unrestricted  motion  of  objects  in 
space  pose  difficult,  perhaps  insurmountable,  problems  for  these  techniques. 
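
The whole-frame assumption is easy to see in a one-dimensional sketch. The code below is illustrative only and is not taken from the cited systems: the function names, the use of raw (not mean-subtracted) products, and the averaging over the overlapping region are assumptions made here for brevity.

```python
def best_shift(frame1, frame2, max_shift):
    """Return the shift s that maximizes the mean of frame1[i] * frame2[i + s].

    A single shift is reported for the entire frame, which is exactly the
    assumption that the image moves as a whole between the two frames.
    """
    def score(s):
        pairs = [(frame1[i], frame2[i + s]) for i in range(len(frame1))
                 if 0 <= i + s < len(frame2)]
        return sum(a * b for a, b in pairs) / len(pairs)
    return max(range(-max_shift, max_shift + 1), key=score)


def frame_difference(frame1, frame2):
    """Subtraction scheme: pointwise difference between successive frames."""
    return [b - a for a, b in zip(frame1, frame2)]


# A small intensity bump that moves two samples to the right.
frame_a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
frame_b = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
shift = best_shift(frame_a, frame_b, max_shift=3)
difference = frame_difference(frame_a, frame_b)
```

The difference signal marks where something changed but does not by itself give a velocity, and the correlation peak gives one displacement for the whole frame. Two bumps moving in opposite directions would not be described by any single shift.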

Other  intensity-based  schemes  have  been  proposed  for  biological  systems.  A 
simple  motion  detector  can  be  constructed  by  comparing  the  outputs  of  two 
detectors to light increments at two adjacent positions. The output at position $p_1$
and time t is compared with that at $p_2$ at time $t - \delta t$ (a low-pass temporal filter
may be used instead of the delay [29]). Two variations of this approach, called the
delayed comparison scheme, have been proposed as models for biological systems.
The first is obtained by multiplying the two values, i.e. $D(p_1, t) \cdot D(p_2, t - \delta t)$, where
D denotes the output of the subunits, shown in Figure 1a. If a point of light moves
from $p_2$ to $p_1$ in time equal to $\delta t$, this product will be positive. In an array of such
detectors, the average output is essentially equivalent to a cross-correlation of the
inputs [29]. An alternative method along the same general line is the "And-Not"
scheme  proposed  by  Barlow  and  Levick  [30]  for  the  directionally  selective  units  in 
the  rabbit’s  retina  (a  similar  scheme  was  suggested  for  the  cat’s  visual  cortex  [31]). 


Figure 1. The delayed comparison schemes. (a) The two inputs are multiplied.
(b) The "And-Not" scheme.

Evidence for inhibitory interactions within the directionally selective mechanism
led to a model in which the motion detector computes the logical "And" of $D(p_1, t)$
and "Not" of $D(p_2, t - \delta t)$ (see Figure 1b). In this scheme, a motion from $p_2$ to $p_1$
is "vetoed" by a delayed response from $p_2$, whereas motion from $p_1$ to $p_2$ produces
a positive response. Poggio and Reichardt [32] have proposed a similar scheme for
the visual system of the fly, and an elegant synaptic mechanism that implements
these computations was described by Torre and Poggio [33].
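
The two delayed comparison schemes can be sketched for a one-dimensional, discrete-time stimulus. The sketch is illustrative only: the function names, the binary stimulus, and the delay of one frame are assumptions introduced here, not details of the cited models.

```python
def moving_dot(n_positions, n_frames, start, step):
    """Binary frames containing one bright dot that moves `step` positions per frame."""
    frames = []
    for t in range(n_frames):
        frame = [0] * n_positions
        pos = start + step * t
        if 0 <= pos < n_positions:
            frame[pos] = 1
        frames.append(frame)
    return frames


def multiplicative_detector(frames, p1, p2, delay=1):
    """Sum over time of D(p1, t) * D(p2, t - delay); responds to motion from p2 to p1."""
    return sum(frames[t][p1] * frames[t - delay][p2]
               for t in range(delay, len(frames)))


def and_not_detector(frames, p1, p2, delay=1):
    """Count the times D(p1, t) AND NOT D(p2, t - delay) holds.

    Motion from p2 to p1 is vetoed by the delayed signal from p2,
    while motion from p1 to p2 passes.
    """
    return sum(1 for t in range(delay, len(frames))
               if frames[t][p1] and not frames[t - delay][p2])


rightward = moving_dot(6, 6, start=0, step=1)    # passes p2 = 2, then p1 = 3
leftward = moving_dot(6, 6, start=5, step=-1)    # passes p1 = 3, then p2 = 2
```

With these stimuli the multiplicative detector responds only to the rightward dot and the And-Not detector only to the leftward dot, so each unit is directionally selective but for opposite directions under this choice of $p_1$ and $p_2$.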

Some  general  properties  of  the  delayed  comparison  schemes  are  worth  noting. 
First,  these  detectors  respond  selectively  not  only  to  continuous  motion,  but  also 
to discrete jumps of the stimulus between positions $p_1$ and $p_2$. Second, the speed of
motion  must  lie  within  a  certain  range,  determined  by  the  delay  (or  the  low-pass 
filtering)  and  the  separation  between  the  receptors.  A  range  of  velocities  can  be 
covered  by  a  family  of  detectors  with  different  internal  delays  and  interreceptor 
spacing.  Finally,  motion  measurements  cannot  be  determined  reliably  from  the 
output  of  a  single  detector  of  this  type.  The  accurate  and  reliable  measurement  of 
motion  will  require  the  combination  of  the  outputs  from  an  array  of  such  elementary 
detectors. 

In  gradient  schemes,  the  local  motion  measurements  are  derived  via  a  comparison 
between  intensity  gradients,  and  temporal  intensity  changes.  A  one-dimensional 
example,  illustrating  the  basic  principle,  is  shown  in  Figure  2.  Consider  the  intensity 
profile (intensity I as a function of position x), indicated by the solid curve in Figure
2. At the point p, the profile has a positive slope. If the profile moves to the left,
indicated by the dashed curve, the intensity value I at p will be increasing; for a
rightward motion, indicated by the dotted and dashed curve, I(p) will be decreasing.
The sign of the temporal change in I(p) thus signals the direction of motion, and
from the magnitude of the spatial and temporal intensity changes, the speed of
motion can be determined. In principle, measurements of motion may be obtained
wherever the image intensity gradient is non-zero; however, the measurements are
more reliable at the location of edges, where steep intensity gradients are induced.

Figure 2. Comparison of the sign of the spatial and temporal derivatives of intensity
at the point p yields the sign of the direction of motion.
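
For a profile that translates without changing shape, the one-dimensional gradient principle reduces to dividing the temporal derivative by the spatial derivative, with a sign change. The sketch below assumes a sigmoid edge and central differences; both choices, and all names, are ours and serve only as an illustration.

```python
import math


def profile(x):
    """A smooth one-dimensional edge with positive slope everywhere (illustrative)."""
    return 1.0 / (1.0 + math.exp(-x))


def translating(v):
    """Return I(x, t) for the profile translating rigidly with constant velocity v."""
    return lambda x, t: profile(x - v * t)


def gradient_velocity(intensity, x, t, dx=1e-3, dt=1e-3):
    """Estimate velocity at (x, t) as -I_t / I_x using central differences."""
    i_x = (intensity(x + dx, t) - intensity(x - dx, t)) / (2 * dx)
    i_t = (intensity(x, t + dt) - intensity(x, t - dt)) / (2 * dt)
    if abs(i_x) < 1e-9:
        raise ValueError("spatial gradient too small for a reliable estimate")
    return -i_t / i_x


v_right = gradient_velocity(translating(0.5), x=0.2, t=0.0)
v_left = gradient_velocity(translating(-0.5), x=0.2, t=0.0)
```

The guard on the spatial derivative reflects the remark above that the estimate is only meaningful where the gradient is non-zero, and is most reliable where it is steep.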

In  two  dimensions,  the  spatial  and  temporal  intensity  changes  alone  are  not 
sufficient  to  determine  the  local  direction  and  magnitude  of  velocity  [12,  34-37], 
because  of  the  aperture  problem,  illustrated  in  Figure  3.  If  the  motion  of  the 
edge  E  is  to  be  detected  by  operations  which  examine  an  area  A  that  is  small 
compared  to  the  overall  extent  of  the  edge,  the  only  motion  that  can  be  extracted 
is  the  component  c  perpendicular  to  the  local  orientation  of  the  edge.  For  example, 
such  operations  cannot  distinguish  between  motion  in  the  directions  b,  c,  and  d. 
To  determine  the  motion  completely,  a  second  stage  of  analysis  is  required,  which 
integrates  the  local  motion  measurements,  either  over  an  area  of  the  image,  or 
along  contours. 


Figure 3. The aperture problem. Motion in the directions b, c and d cannot be
distinguished when viewed through the local aperture A.
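
The aperture problem can be reproduced numerically. The sketch below builds a straight edge that translates with a chosen velocity and applies a purely local gradient measurement; the tanh edge profile, the function names, and the finite-difference step are assumptions made for illustration.

```python
import math


def edge_image(normal, velocity):
    """I(x, y, t) for a straight edge with unit normal `normal` moving with `velocity`."""
    nx, ny = normal
    vx, vy = velocity

    def intensity(x, y, t):
        d = nx * (x - vx * t) + ny * (y - vy * t)   # signed distance from the edge
        return math.tanh(d)
    return intensity


def normal_flow(intensity, x, y, t, h=1e-3):
    """Local estimate of motion: the velocity component along the intensity gradient."""
    i_x = (intensity(x + h, y, t) - intensity(x - h, y, t)) / (2 * h)
    i_y = (intensity(x, y + h, t) - intensity(x, y - h, t)) / (2 * h)
    i_t = (intensity(x, y, t + h) - intensity(x, y, t - h)) / (2 * h)
    mag = math.hypot(i_x, i_y)
    speed = -i_t / mag
    return (speed * i_x / mag, speed * i_y / mag)


# Three different true velocities of a vertical edge (unit normal along x).
flows = [normal_flow(edge_image((1.0, 0.0), v), 0.1, 0.3, 0.0)
         for v in [(1.0, 0.0), (1.0, 1.0), (1.0, -2.0)]]
```

All three velocities share the same component perpendicular to the edge, so the local measurement returns the same vector for each of them. The component along the edge leaves no trace in the local intensity changes.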

2.2.  Token-matching  Schemes  for  Motion  Measurement 

In token-matching schemes, identifiable elements, or tokens, are located and then
matched over time. Assuming that the visual input is given as a sequence of
discrete frames, a counterpart for each element in one frame must be located in
the next. This raises the correspondence problem, illustrated in Figure 4. The
filled circles in the figure represent the first frame, and the open circles the second.
There are two possible one-to-one pairings between the elements of the two frames,
leading to two patterns of perceived motion: diagonal (a) or horizontal (b). In
this example, the match is only two-way ambiguous. In general, each frame could
contain  many  elements  arranged  in  complex  figures;  a  correspondence  must  then 
be  established  among  them.  The  rules  governing  the  correspondence  process  in 
human  vision  have  been  investigated  [38-44],  but  are  still  far  from  being  completely 
understood.  Token-matching  schemes  for  motion  measurement  have  also  been 
studied  for  computer  vision  [45-50]. 


Figure 4. A simple correspondence problem: (a) diagonal pairing, (b) horizontal pairing.
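
The two-element example can be enumerated directly. In the sketch below the point coordinates are invented, and the rule of choosing the pairing with the smallest total displacement is an assumption used only to show how a rule resolves the ambiguity; it is not offered as the rule used by the human visual system.

```python
import math
from itertools import permutations


def pairings(frame1, frame2):
    """All one-to-one pairings between two equally sized lists of points."""
    return [list(zip(frame1, perm)) for perm in permutations(frame2)]


def total_displacement(pairing):
    """Sum of the Euclidean distances moved under a pairing."""
    return sum(math.dist(a, b) for a, b in pairing)


def nearest_pairing(frame1, frame2):
    """Select the pairing with the smallest total displacement (an assumed rule)."""
    return min(pairings(frame1, frame2), key=total_displacement)


first_frame = [(0, 0), (0, 1)]     # filled circles
second_frame = [(1, 0), (1, 1)]    # open circles
candidates = pairings(first_frame, second_frame)
chosen = nearest_pairing(first_frame, second_frame)
```

With n elements per frame there are n! one-to-one pairings, so exhaustive enumeration of this kind is practical only for very small displays.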

Two general problems of token-matching schemes are relevant to both
biological  and  machine  motion  analysis.  The  first  concerns  the  level  at  which 
the  correspondence  is  established.  By  this  we  mean  the  degree  of  preprocessing 
and  the  complexity  of  the  participating  tokens.  Matching  may  be  established 
between  simple  tokens  such  as  points,  blobs,  and  edge  fragments.  Alternatively, 
the  matching  process  may  operate  on  complex  tokens  such  as  structured  forms, 
or  even  the  images  of  recognized  objects.  The  use  of  complex  tokens  can  simplify 
the  correspondence  process,  since  a  complex  token  will  usually  have  a  unique 
counterpart  in  a  subsequent  frame.  Primitive  tokens  will  usually  have  many 
competing  possible  matches,  but  their  use  has  two  distinct  advantages.  The  first 
is  a  reduced  preprocessing  requirement,  which  is  of  special  importance  in  motion 
perception,  where  computation  time  is  severely  restricted.  The  second  is  that  a 
correspondence  scheme  based  on  primitive  tokens  can  operate  on  arbitrary  objects 
engaged  in  complex  shape  changes.  It  seems,  therefore,  that  the  correspondence 
process  should  operate  on  the  level  of  rather  primitive  elements,  perhaps  at  the 
level  of  Marr’s  full  primal  sketch  [51,52]. 

The  second  general  problem  concerns  the  possible  role  of  intensity-based  and 
token-matching  schemes  in  an  integrated  vision  system.  Intensity-based  schemes 
tend  to  be  fast  and  sensitive,  but  the  ambiguity  of  the  local  measurements  may 
make  it  difficult  to  recover  the  velocity  field  accurately.  A  token-matching  scheme 
can,  in  principle,  track  a  sharply  localized  token  (such  as  a  line  termination)  over 
long  distances,  and  thereby  achieve  a  high  degree  of  accuracy,  at  the  price  of 
more  extensive  processing,  in  locating  the  tokens  and  solving  the  correspondence 
problem. 

In  light  of  the  differences  in  their  basic  properties,  it  is  possible  that  the  two 
motion  measurement  schemes  serve  distinct  visual  tasks.  The  intensity-based  system 
may be useful as an "early warning" system, and for the separation of moving
objects from their background. Token-matching schemes may play an important
role in the recovery of structure from motion, where accurate tracking over
considerable distances is useful. A second possibility is that the two schemes
interact to complement each other. For example, a token-matching scheme might
be guided by additional constraints supplied by an intensity-based system.


2.3. Two Motion Systems in Human Vision

Psychological  studies  of  motion  detection  and  measurement  in  the  human  system 
have  distinguished  two  types  of  visual  motion:  discrete  and  continuous.  For  human 
observers  to  perceive  motion,  the  stimulus  need  not  move  continuously  across  the 
visual  field.  Under  the  appropriate  spatial  and  temporal  presentation  parameters,  a 
stimulus  presented  sequentially  can  produce  the  impression  of  smooth,  uninterrupted 
motion (as in motion pictures) [53]. The visual system can "fill in" the gaps in the
discrete presentation even when the stimuli are separated by up to several degrees
of visual angle, and by long temporal intervals (400 msec [54]). The resulting
motion, termed "apparent" or "beta" motion, is perceptually indistinguishable from
continuous motion.

The  apparent  motion  phenomena  raise  the  question  of  whether  discrete  and 
continuous  motion  are  registered  by  two  different  mechanisms.  The  fact  that  the 
visual  system  can  register  both  types  of  motion  does  not  imply  the  existence  of 
two  separate  mechanisms,  since  a  system  that  registers  discrete  motion  could  in 
principle  register  continuous  motion  as  well.  Psychophysical  evidence  supports, 
however,  the  view  that  two  different  mechanisms  are  in  fact  involved  in  the  process 
of motion detection and measurement [55-60]. The terms "short range" and "long
range" processes were suggested by Braddick [58] for the two mechanisms. The
short range mechanism registers continuous motion, or motion presented discretely,
with displacements of up to about 15 min. of arc (in the center of the visual
field) and temporal intervals of less than about 60-100 msec. The long range
mechanism can process larger displacements and temporal intervals. Braddick's
terminology characterizes the distinction between the two mechanisms better than
the discrete/continuous dichotomy, since discrete presentation with jumps of up to
15 min. of visual arc will be processed by the short range mechanism.

In  the  human  visual  system,  it  appears  that  the  short  range  process  is  an 
intensity-based  scheme,  whereas  the  long  range  process  is  a  token-matching  scheme. 
Braddick [58] proposed that the directionally-selective units of visual cortex underlie
the short range process, suggesting that the spatial and temporal limits reflect
the spatial and temporal parameters of these neural units. Marr and Ullman
[12] present a gradient scheme for the detection and measurement of motion,
which  includes  a  model  for  constructing  the  directionally-selective  units,  and  an 
algorithm  for  combining  the  local  measurements  to  compute  the  two-dimensional 
velocity  field.  The  long  range  motion  phenomena  illustrate  our  ability  to  derive  a 
correspondence  of  elements  in  the  changing  image,  over  considerable  distances  and 
temporal  intervals.  In  these  situations,  there  is  no  continuous  motion  of  elements 
across  the  retina  to  be  measured  directly.  Psychophysical  studies  have  shown  the 
long  range  correspondence  to  be  based  on  more  symbolic  primitives,  such  as  edges, 
bars,  blobs,  simple  groups  of  primitive  elements,  and  texture  edges  [13,61]. 

2.4.  Summary 

To  summarize,  several  methods  are  available  for  the  detection  and  measurement 
of  motion.  These  methods  differ  in  the  constraints  they  derive  from  the  changing 
image.  Intensity-based  schemes  utilize  the  spatial  and  temporal  changes  in  the  image 
intensity  pattern  to  constrain  local  velocity,  while  token-matching  schemes  extract 
more  symbolic  tokens  from  the  image,  which  are  then  matched  over  time.  These 
two  techniques  for  motion  analysis  give  rise  to  different  computational  problems, 
and  consequently  to  different  kinds  of  processes  in  biological  and  computer  vision 
systems.


3. Deriving Velocity Constraints from the Image


In this section, we first present a scheme for extracting initial motion constraints
from the image, proposed by Marr and Ullman [12], which was motivated by
computational studies of early visual processing, and neurophysiological studies
of directionally-selective simple cells in primate visual cortex. The use of this
type of initial motion measurement raises the motion integration problem: the
measurements do not yet specify the true motion of objects in the changing image,
and must be integrated in some way to compute the velocity field. Computational
studies suggested that the first stage of image analysis should be the detection
of intensity changes (see [62] for a review). Marr and Hildreth [63] have proposed
that an optimal operator for the initial filtering of the image is the Laplacian
of a Gaussian, $\nabla^2 G$, whose shape may be approximated by the difference of
two Gaussians. The elements in this convolution output, which correspond to the
location of intensity changes, are the zero-crossings [64]. Figure 5 shows an image
which has been processed through a $\nabla^2 G$ filter, and the resulting zero-crossing
contours. Marr and Hildreth suggested that the convolution of the image with $\nabla^2 G$
is represented in the output of the retinal ganglion X-cells, and that a class of
simple cells in visual cortex assumes the role of zero-crossing detection.


Figure 5. The detection of intensity changes. (a) The original image. (b) The convolution
of (a) with a $\nabla^2 G$ operator. (c) The resulting zero-crossing contours.
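
The detection of intensity changes by zero-crossings can be illustrated in one dimension, where the Laplacian of a Gaussian reduces to the second derivative of a Gaussian. The sketch below is a simplification under stated assumptions: it is one-dimensional, the kernel is sampled at integer positions and truncated at four standard deviations, and borders are handled by replicating the end samples. None of these details come from the cited work.

```python
import math


def d2_gaussian(x, sigma):
    """Second derivative of a unit-area Gaussian (1-D analog of the Laplacian of a Gaussian)."""
    g = math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return (x * x / sigma ** 4 - 1 / sigma ** 2) * g


def convolve_same(signal, kernel):
    """Discrete convolution with edge replication; the kernel length must be odd."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + half - k, 0), n - 1)   # clamp the index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def zero_crossings(values):
    """Indices i at which values[i] and values[i + 1] have opposite signs."""
    return [i for i in range(len(values) - 1) if values[i] * values[i + 1] < 0]


sigma = 2.0
kernel = [d2_gaussian(x, sigma) for x in range(-8, 9)]
step_edge = [0.0] * 20 + [1.0] * 20          # intensity step between samples 19 and 20
response = convolve_same(step_edge, kernel)
crossings = zero_crossings(response)
```

The filtered signal is positive on the dark side of the step and negative on the bright side, and its only sign change falls between samples 19 and 20, at the position of the intensity change.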

Marr and Ullman [12] have extended this model for simple cells, including a
mechanism for their directional selectivity. The basic idea is illustrated in Figure
6. Figure 6a shows the one-dimensional output of the convolution of a step-edge
intensity profile with the second derivative of a Gaussian, $D^2 G * I$. Figure 6b and
Figure 6c illustrate the time derivative, $\partial (D^2 G * I) / \partial t$, for motion of the profile to
the left and right, respectively. At the location of the zero-crossing Z, the time
derivative will be negative for motion to the left, and positive for motion to the
right. Similar to the gradient scheme introduced in Section 2, the sign of contrast
of the zero-crossing can be compared with the sign of the temporal derivative, to
compute the direction of motion of the zero-crossing. By combining the magnitude
of the slope of the convolution output as it crosses zero, with the magnitude of the
time derivative, a rough magnitude of velocity can be computed. In two dimensions,
comparison of the spatial and temporal derivatives of $\nabla^2 G * I$ (where I is now a
two-dimensional intensity distribution) at the location of zero-crossings provides
only the component of motion in the direction perpendicular to the local orientation
of the contour.
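
The one-dimensional case can be checked with a short calculation. For an ideal unit step, the convolution with the second derivative of a Gaussian is the first derivative of that Gaussian centered on the edge, so the response of a translating edge is available in closed form. The sketch below assumes an ideal dark-to-bright step, constant velocity, and central differences; the function names are ours.

```python
import math


def response(x, t, v, sigma=1.0):
    """(D^2 G * I)(x, t) for a unit step edge located at v * t.

    Convolving a step with the second derivative of a Gaussian gives the
    first derivative of the Gaussian, centered on the edge.
    """
    w = x - v * t
    g = math.exp(-w * w / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return -w / sigma ** 2 * g


def zero_crossing_velocity(v_true, t=0.0, h=1e-4):
    """Return (slope, time derivative, velocity estimate) at the zero-crossing."""
    x0 = v_true * t                                   # the zero-crossing lies on the edge
    slope = (response(x0 + h, t, v_true) - response(x0 - h, t, v_true)) / (2 * h)
    d_dt = (response(x0, t + h, v_true) - response(x0, t - h, v_true)) / (2 * h)
    return slope, d_dt, -d_dt / slope


slope_r, ddt_r, v_r = zero_crossing_velocity(2.0, t=0.5)    # rightward motion
slope_l, ddt_l, v_l = zero_crossing_velocity(-2.0, t=0.5)   # leftward motion
```

The slope at the zero-crossing is negative for this edge in both cases, while the time derivative is positive for rightward and negative for leftward motion. Comparing the two signs gives the direction, and the ratio of the magnitudes gives the speed.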

Marr  and  Ullman  have  proposed  that  the  retinal  ganglion  Y-cells  carry  the 
time derivative of the $\nabla^2 G$ convolution, and that simple cells combine the spatial
and temporal derivatives, carried by the X-system and Y-system (via the LGN),
to compute the direction of motion of the zero-crossing contours. A neural model
for the derivation of the spatial and temporal derivatives has been proposed by
Richter and Ullman [65]. Recent neurophysiological studies support the role of
simple cells in the detection of zero-crossings (Richter, personal communication). In
addition to neurophysiological support, this scheme appears to be consistent with
psychophysical studies of the short-range process [12].


Figure 6. The Marr-Ullman scheme. (a) Convolution of a step intensity change with
$D^2 G$. (b) and (c) Temporal intensity derivative for motion of the profile to the left and
right.

From  a  computational  standpoint,  restricting  the  measurement  of  motion  to 
the  location  of  zero-crossings  has  two  advantages  over  schemes  based  only  on  the  raw 
intensities. First, the zero-crossings of $\nabla^2 G * I$ correspond to points in the image at
which the gradient of intensity is locally maximum, yielding the most reliable local
velocity measurements. Second, the zero-crossings are tied more closely to physical
features; if the zero-crossings move, it is more likely to be the consequence of
movement of an underlying physical surface. There are many factors that can cause
intensity  to  change  locally,  such  as  changing  illumination;  a  change  in  intensity 
over  time  is  not  necessarily  due  to  the  motion  of  an  underlying  surface. 

The  zero-crossing  scheme  presented  above  does  not  yet  solve  the  motion 
measurement  problem.  The  measurement  of  the  motion  of  zero-crossings,  using 
a  local  gradient  scheme,  provides  only  the  component  of  motion  in  the  direction 
perpendicular  to  the  orientation  of  the  contour.  The  component  of  velocity  along  the 
contour remains undetected. More formally, we may express the velocity field along
a contour by the function $V(s)$, where s denotes arclength. $V(s)$ can be decomposed
into components tangent and perpendicular to the contour, as illustrated in Figure
7. $u^{T}(s)$ and $u^{\perp}(s)$ are unit vectors in the directions tangent and perpendicular to
the curve, and $v^{T}(s)$ and $v^{\perp}(s)$ denote the magnitudes of the two components:

$$V(s) = v^{T}(s)\, u^{T}(s) + v^{\perp}(s)\, u^{\perp}(s) \qquad (1)$$


Figure  7.  The  decomposition  of  velocity  V(s)  into  tangential  and  perpendicular 
components 

The component $v^{\perp}(s)$ is given directly by the initial measurements from the
changing image; the computation of $V(s)$ requires the further recovery of $v^{T}(s)$.
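
The decomposition in equation (1) amounts to two dot products. The sketch below is a minimal illustration; the function names, and the convention that the perpendicular direction is the tangent rotated counterclockwise by 90 degrees, are assumptions made here.

```python
import math


def decompose(velocity, tangent):
    """Split a 2-D velocity into magnitudes along the unit tangent and unit normal."""
    tx, ty = tangent
    norm = math.hypot(tx, ty)
    tx, ty = tx / norm, ty / norm
    nx, ny = -ty, tx                       # normal: tangent rotated by 90 degrees
    v_t = velocity[0] * tx + velocity[1] * ty
    v_n = velocity[0] * nx + velocity[1] * ny
    return v_t, v_n, (tx, ty), (nx, ny)


def recompose(v_t, v_n, u_t, u_n):
    """Rebuild V = v_t * u_t + v_n * u_n, as in equation (1)."""
    return (v_t * u_t[0] + v_n * u_n[0], v_t * u_t[1] + v_n * u_n[1])


# Two different velocities at a point where the contour runs along the x-axis.
v_t1, v_n1, u_t, u_n = decompose((3.0, 4.0), (1.0, 0.0))
v_t2, v_n2, _, _ = decompose((-1.0, 4.0), (1.0, 0.0))
rebuilt = recompose(v_t1, v_n1, u_t, u_n)
```

The two velocities differ only in their tangential magnitude, so a measurement that delivers the perpendicular magnitude alone cannot tell them apart.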

At  the  very  least,  the  computation  of  V(s)  requires  the  integration  of  the 
constraints provided by $v^{\perp}(s)$ along the contour. In general, however, the solution
may  still  be  underdetermined.  Additional  constraint  is  required  to  compute  a  single 
velocity  field.  Figure  8  illustrates  two  examples,  in  which  the  velocity  field  solution 
is  not  unique.  In  Figure  8a,  the  solid  and  dotted  lines  represent  the  image  of  a 
moving  circle,  at  different  instants  of  time.  In  the  first  frame  (solid  line),  the  circle 
lies  parallel  to  the  image  plane,  while  in  the  second  frame,  the  circle  is  slanted  in 
depth.  One  velocity  field  consistent  with  this  sequence  is  derived  from  pure  rotation 
of  the  circle  about  the  central  vertical  axis,  as  shown  to  the  left  in  Figure  8a. 
(The  arrows  represent  local  velocities.)  However,  there  could  also  be  a  component 
of rotation in the plane of the circle, about its center, as shown to the right in
Figure 8a. Both velocity fields correspond to valid rigid motions of the circle. This
ambiguity is not particular to circles. In Figure 8b, the solid curve $C_1$ rotates,
translates and deforms over time, to yield the dotted curve $C_2$. The mapping of
points from $C_1$ to $C_2$ is much less clear (consider, for example, different possible
velocities for the point p). The precise computation of the velocity field in this case
is important, when one considers the subsequent computation of structure from
motion; different choices for the velocity field may yield different three-dimensional
structures. The computation of a unique velocity field requires additional assumptions
about physical surfaces, and the velocity field that they generate under
motion.

Figure 8. Ambiguity of the velocity field computation. (a) A circle, rotating in depth.
(b) A deforming curve.
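
The circle example can be verified with a short calculation. The sketch treats only the instant at which the circle lies parallel to the image plane; the radius $R$, the angle parameter $\theta$, and the spin rate $\omega_c$ are symbols introduced here for illustration.

```latex
% Circle of radius R centered at the origin of the image plane
r(\theta) = R\,[\cos\theta,\ \sin\theta]^T, \qquad
u^{T}(\theta) = [-\sin\theta,\ \cos\theta]^T, \qquad
u^{\perp}(\theta) = [\cos\theta,\ \sin\theta]^T

% Spin of the circle about its center, in its own plane, at angular velocity \omega_c
V_c(\theta) = \omega_c R\,[-\sin\theta,\ \cos\theta]^T = \omega_c R\, u^{T}(\theta)
\quad\Longrightarrow\quad
v^{\perp}_c(\theta) = V_c(\theta) \cdot u^{\perp}(\theta) = 0
```

Adding such a spin to any velocity field changes only the tangential component and leaves every perpendicular measurement unchanged, which is why the local measurements cannot rule it out.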


In  conclusion,  the  computation  of  the  velocity  field,  for  the  case  of  general 
motion,  requires  a  scheme  that  combines  local  measurements  of  motion  from  the 
changing  image,  subject  to  additional  constraints.  This  is  the  motion  integration 
problem. 


4. The Integration of Local Motion Measurements


In  this  section,  we  discuss  the  motion  integration  problem,  strictly  from  a 
computational  viewpoint.  The  results  that  we  present  here  are  largely  independent 
of  the  nature  of  the  initial  motion  measurements,  and  in  particular,  do  not  depend 
on  the  Marr-Ullman  scheme  discussed  previously.  This  section  will  be  organized  by 
the  type  of  additional  constraint  that  may  be  utilized  in  the  combination  stage.  We 
will  consider  four  types  of  additional  constraint  on  the  velocity  field:  (1)  velocity 
is  constant  over  an  area  of  the  image  (valid  for  pure  translation);  (2)  the  velocity 
field  is  consistent  with  rigid  rotation  and  translation  of  objects  in  the  image  plane; 
(3)  the  velocity  field  is  smooth,  and  exhibits  the  least  variation  among  the  set 
of  velocity  fields  consistent  with  the  image  constraints;  and  (4)  the  velocity  field 
is  smooth,  exhibits  the  least  variation  possible,  and  is  constant  over  small  time 
intervals.  We  will  discuss  methods  for  combining  local  measurements,  given  each  of 
the  four  types  of  constraint. 

4.1.  The  Constant  Velocity  Constraint 

Much  of  the  previous  work  in  motion  analysis  has  addressed  the  case  of  pure 
translation  of  objects  in  the  image  plane.  The  early  gradient  schemes  used  in 
computer  vision  [34,66]  assumed  that  velocity  would  be  constant  over  a  large 
area  of  the  image.  Most  correlation  and  subtraction  schemes  also  embodied  this 
assumption.  Marr  and  Ullman  [12,67]  proposed  a  scheme  in  which  each  local 
measurement  restricts  the  true  velocity  of  a  patch  to  lie  within  a  180°  range  of 
directions  to  one  side  of  a  segment  of  the  local  zero-crossing  contour.  A  set  of 
measurements  taken  at  different  orientations  along  a  zero-crossing  contour  then 
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further  restrict  the  allowable  velocity  directions,  until  a  single  velocity  direction  is 
obtained,  which  is  consistent  with  all  the  local  measurements. 
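
The way successive 180° ranges narrow the allowable directions can be sketched numerically. This is a minimal illustration, not the Marr-Ullman implementation: it assumes each measurement is summarized by the direction (in degrees) in which the contour segment was seen to move perpendicular to itself, and it samples candidate directions at a fixed step. The name `allowed_directions` and its parameters are hypothetical.

```python
import math

def allowed_directions(motion_normals_deg, step_deg=1, eps=1e-9):
    """Return sampled directions (degrees) compatible with every measurement.

    Each entry of motion_normals_deg is the direction, perpendicular to the
    local contour, in which that contour segment was observed to move.
    A candidate velocity direction is kept only if it has a positive
    component along every such normal, i.e. it lies inside every
    180-degree range.
    """
    kept = []
    for theta in range(0, 360, step_deg):
        t = math.radians(theta)
        if all(math.cos(t - math.radians(n)) > eps for n in motion_normals_deg):
            kept.append(theta)
    return kept

# One measurement leaves a half-plane of directions; each further
# measurement at a different orientation shrinks the surviving range.
one = allowed_directions([0])
two = allowed_directions([0, 90])
three = allowed_directions([0, 90, 125])
```

With normals at 0° and 90°, the surviving directions lie strictly between 0° and 90°; the third measurement narrows the range further.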

The  scheme  that  we  present  in  the  next  section,  for  analyzing  rotation  and 
translation  in  the  image  plane,  may  also  be  used  for  the  restricted  case  of  pure 
translation. While these schemes cannot account for the full range of human motion
perception, they may be useful for the initial detection and rough measurement
of motion in the periphery, or analysis of motion during smooth pursuit eye
movements, in which stationary objects translate rigidly with respect to the eye.
In computer vision, there are restricted applications for these techniques, such as
the  tracking  of  objects  along  a  conveyor,  or  computation  of  camera  motion  [68]. 

4.2.  Rigid  Motion  in  the  Image  Plane

In  this  and  remaining  sections,  we  will  focus  on  the  motion  of  contours.  The  results 
apply,  however,  to  continuous  patches  in  the  image  as  well.  First,  suppose  we  have 
a  rigid  curve  undergoing  general  motion  in  space.  Its  instantaneous  motion  may 
be described as the combination of: (1) a rotation with angular velocity $\omega$ about a
single axis in space, which we will denote by the vector $\mathbf{n} = [n_1, n_2, n_3]^T$ ($T$ denotes
the transpose of a vector), and (2) a translation, which we will denote by the vector
$\mathbf{d} = [d_1, d_2, d_3]^T$. Let the curve be given parametrically by $C = (x(s), y(s), z(s))$.
The location of a point on the curve may be given by the position vector
$\mathbf{r} = [x(s), y(s), z(s)]^T$. If we let the optical axis lie along the z-axis, and let the projection
of the curve onto the image plane (the (x,y) plane) be orthographic, then the
two-dimensional velocity field V(s) along the contour is given by:


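$$V(s) = M(\omega\mathbf{n} \times \mathbf{r} + \mathbf{d}) = \omega z(s) \begin{bmatrix} n_2 \\ -n_1 \end{bmatrix} + \omega n_3 \begin{bmatrix} -y(s) \\ x(s) \end{bmatrix} + \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \qquad (2)$$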


M denotes the matrix which performs the orthographic projection. The first term in
the resulting expression describes the component of the velocity field due to rotation
in depth about an axis parallel to the image plane (the axis $\mathbf{n} = [n_1, n_2, 0]^T$); the
second term is the component due to motion in the image plane (rotation about
the axis $\mathbf{n} = [0, 0, n_3]^T$), and the third term is the translation component.
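
The three terms of expression (2) can be evaluated directly for a single point on the curve. The sketch below is only an illustration of that sum; the function name `projected_velocity` and its argument layout are assumptions made here, not part of the original formulation.

```python
def projected_velocity(point, omega, axis, translation):
    """Evaluate the three terms of expression (2) at one point of the curve.

    point       -- (x, y, z) position on the curve
    omega       -- angular velocity
    axis        -- (n1, n2, n3) rotation axis
    translation -- (d1, d2, d3); d3 is lost under orthographic projection
    """
    x, y, z = point
    n1, n2, n3 = axis
    d1, d2, _ = translation
    # Term 1: rotation in depth, scaled by the depth z(s).
    depth_x, depth_y = omega * z * n2, omega * z * (-n1)
    # Term 2: rotation in the image plane.
    plane_x, plane_y = omega * n3 * (-y), omega * n3 * x
    # Term 3: translation.
    return (depth_x + plane_x + d1, depth_y + plane_y + d2)

# Rotation purely in the image plane plus a translation.
in_plane = projected_velocity((1.0, 2.0, 3.0), 2.0, (0.0, 0.0, 1.0), (0.5, -0.5, 0.0))
# Rotation in depth about the x-axis: only the depth z contributes.
in_depth = projected_velocity((1.0, 2.0, 3.0), 2.0, (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```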

Consider the restricted case of rigid motion in the image plane; the velocity
field now corresponds to the combination of a translation, and rotation about the
axis $\mathbf{n} = [0, 0, 1]^T$. Thus, V(s) is given by:

$$V(s) = \omega \begin{bmatrix} -y(s) \\ x(s) \end{bmatrix} + \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \qquad (3)$$


V(s) is simply a translation, rotation and scaling of the image curve (x(s), y(s)),
as illustrated in Figure 9. In Figure 9a, the curve $C_1$ undergoes a small rotation
and translation in the image plane to yield the curve $C_2$. The arrows indicate local
velocity vectors along the curve. In Figure 9b, these velocity vectors have been
translated to a common origin in velocity space, where the x and y axes represent
the x and y components of velocity. The curve in velocity space has the same shape
as the image curve $C_1$; its size is proportional to angular velocity $\omega$, and it is rotated
90° with respect to $C_1$ (this relationship is also used in kinematics [69]).

Figure 9. Rigid motion in the image plane. (a) The velocity field in the image. (b) The
velocity vectors in velocity space.

The  additional  translation  of  the  curve  in  the  image  yields  the  same  translation  of 
the  curve  in  velocity  space.  In  the  case  of  pure  translation,  the  image  of  the  velocity 
field  in  velocity  space  degenerates  to  a  single  vector.  In  general,  the  explicit  use  of 
the  velocity  space  aids  in  the  visualization  of  properties  of  the  motion  of  curves, 
and  provides  a  tool  for  establishing  theoretical  properties  of  the  velocity  field. 
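
The velocity-space picture can be checked numerically for a sampled contour. The sketch below assumes a hypothetical test contour (an ellipse sampled at 12 points) and arbitrary values for the angular velocity and translation; the function names are introduced here for illustration only.

```python
import math

def in_plane_velocity(point, omega, d):
    """Rigid motion in the image plane: V = omega * [-y, x] + [d1, d2]."""
    x, y = point
    return (-omega * y + d[0], omega * x + d[1])

def velocity_space_curve(curve, omega, d):
    """Endpoints of the velocity vectors once moved to a common origin."""
    return [in_plane_velocity(p, omega, d) for p in curve]

def chord_lengths(points):
    """Distances between successive points of a closed polygon."""
    return [math.hypot(points[k][0] - points[k - 1][0],
                       points[k][1] - points[k - 1][1])
            for k in range(len(points))]

# Hypothetical contour and motion parameters.
curve = [(3.0 * math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
         for k in range(12)]
omega, d = 0.2, (1.0, -0.5)

vel = velocity_space_curve(curve, omega, d)
# Same shape, scaled by omega: every chord shrinks by the same factor.
ratios = [v / c for v, c in zip(chord_lengths(vel), chord_lengths(curve))]
# Pure translation: the velocity-space curve collapses to a single vector.
collapsed = set(velocity_space_curve(curve, 0.0, d))
```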

For  the  simple  case  of  rigid  motion  in  the  image  plane,  this  relationship  between 
the  shape  of  the  curve  and  the  velocity  field  is  not  restricted  to  continuous  motion 
of  the  curve.  For  discrete  motion  of  a  curve,  we  will  use  the  term  displacement 
field  for  the  set  of  vectors  which  describe  the  discrete  displacement  of  points  on  the 
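curve. If we let $\sigma$ be a discrete angular rotation of the curve in the image plane,
then the displacement field $V_d(s)$ is given by:

$$V_d(s) = \begin{bmatrix} \cos\sigma - 1 & \sin\sigma \\ -\sin\sigma & \cos\sigma - 1 \end{bmatrix} \begin{bmatrix} x(s) \\ y(s) \end{bmatrix} + \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \qquad (4)$$

$V_d(s)$ is also given by a scaling and rotation of the projected image curve
(x(s), y(s)). In this case, the scale factor $k$ is given by:

$$k = \sqrt{(\cos\sigma - 1)^2 + \sin^2\sigma} = \sqrt{2(1 - \cos\sigma)} \qquad (5)$$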


The angle of rotation $\alpha$ between the image curve (x(s), y(s)) and the
corresponding curve in velocity space, is given by:

$$\tan\alpha = \frac{\sin\sigma}{\cos\sigma - 1}$$

For small $\sigma$, $k \approx \sigma$ and $\alpha \approx \pm 90^\circ$. As before, an additional translation
component simply translates the curve in velocity space.
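
The small-angle behavior can be made explicit with half-angle identities. The derivation below is a sketch under stated assumptions: $\sigma$ denotes the discrete rotation angle with $0 < \sigma < \pi$, $\alpha$ denotes the angle between the image curve and the curve in velocity space, and a rotation by an angle $\theta$ is written in the same matrix form as the displacement field, $\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$.

```latex
\begin{align*}
\cos\sigma - 1 &= -2\sin^2(\sigma/2), \qquad
\sin\sigma = 2\sin(\sigma/2)\cos(\sigma/2), \\
k &= \sqrt{4\sin^4(\sigma/2) + 4\sin^2(\sigma/2)\cos^2(\sigma/2)}
   = 2\sin(\sigma/2), \\
\begin{bmatrix} \cos\sigma - 1 & \sin\sigma \\ -\sin\sigma & \cos\sigma - 1 \end{bmatrix}
&= k \begin{bmatrix} -\sin(\sigma/2) & \cos(\sigma/2) \\ -\cos(\sigma/2) & -\sin(\sigma/2) \end{bmatrix}
 = k \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix},
\qquad \alpha = 90^\circ + \sigma/2 .
\end{align*}
```

As $\sigma \to 0$, $k = 2\sin(\sigma/2) \to \sigma$ and $\alpha \to 90^\circ$, so the discrete case approaches the continuous relationship between the image curve and the velocity field.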

For  rigid  motion  in  the  image  plane,  a  simple  scheme  can  be  used  to  construct 
the  velocity  field.  If  we  know  the  true  direction  of  velocity  for  two  points  on 
the  contour,  we  can  compute  the  direction  of  velocity  everywhere  as  follows:  (1) 
construct  the  lines  perpendicular  to  the  direction  of  velocity  at  the  two  known 
points, (2) compute the intersection of these two lines, (3) from every point $p_i$
along the contour, construct the line to the intersection point; the true direction
of velocity is perpendicular to this line. In Figure 10, we derive the direction at $p_2$,
given known directions at $p_1$ and $p_3$.


Figure  10.  Construction  of  the  velocity  field  for  rigid  motion  in  the  image  plane 

This  construction  is  simply  locating  the  point  about  which  the  motion  can 
be  described  as  pure  rotation.  For  pure  translation,  the  two  lines,  from  points  of 
known  direction  of  velocity,  will  be  parallel,  so  the  direction  of  motion  everywhere 
will  be  equal  to  the  direction  of  motion  of  the  known  points.  Certainly,  if  we  knew 
both  the  direction  and  magnitude  of  velocity  at  two  points  along  the  contour,  we 
could  compute  the  global  motion  parameters,  and  hence  direction  and  magnitude 
of  velocity  everywhere.  However,  from  the  direction  of  velocity  alone  at  two  points 
on  the  curve,  we  can  compute  the  direction  of  velocity  everywhere.  If  we  then  know 
the  magnitude  of  the  perpendicular  components  of  velocity  along  the  curve,  we  can 
compute  both  direction  and  magnitude  of  velocity  along  the  curve. 
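
The construction can be sketched in a few lines. This is an illustrative sketch, not an implementation taken from the text: the function names are hypothetical, the sense of rotation is inferred from one of the two known directions, and the contour normal passed to `full_velocity` is assumed to be a unit vector supplied by the caller.

```python
import math

def rotation_center(p_a, dir_a, p_b, dir_b, eps=1e-12):
    """Intersect the lines through p_a and p_b drawn perpendicular to the
    known velocity directions. Returns None when the lines are parallel,
    which corresponds to pure translation."""
    ax, ay = -dir_a[1], dir_a[0]      # direction of the line through p_a
    bx, by = -dir_b[1], dir_b[0]      # direction of the line through p_b
    det = bx * ay - ax * by
    if abs(det) < eps:
        return None
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    t = (bx * dy - by * dx) / det     # solves p_a + t*a = p_b + u*b for t
    return (p_a[0] + t * ax, p_a[1] + t * ay)

def velocity_direction(p, center, p_ref, dir_ref):
    """Unit direction of velocity at p (p must differ from the center).
    The sense of rotation is taken from the known direction at p_ref."""
    if center is None:                # pure translation: same direction everywhere
        norm = math.hypot(dir_ref[0], dir_ref[1])
        return (dir_ref[0] / norm, dir_ref[1] / norm)
    rx, ry = p_ref[0] - center[0], p_ref[1] - center[1]
    sense = 1.0 if (-ry * dir_ref[0] + rx * dir_ref[1]) >= 0 else -1.0
    qx, qy = p[0] - center[0], p[1] - center[1]
    norm = math.hypot(qx, qy)
    return (-sense * qy / norm, sense * qx / norm)

def full_velocity(direction, normal, v_perp):
    """Scale a known direction so that its component along the contour
    normal equals the measured perpendicular component v_perp. Undefined
    where the direction is tangent to the contour."""
    along = direction[0] * normal[0] + direction[1] * normal[1]
    speed = v_perp / along
    return (speed * direction[0], speed * direction[1])

# Hypothetical data: rotation about (1, 2) with angular velocity 0.5.
center = rotation_center((3.0, 2.0), (0.0, 1.0), (1.0, 5.0), (-1.0, 0.0))
direction = velocity_direction((4.0, 6.0), center, (3.0, 2.0), (0.0, 1.0))
velocity = full_velocity(direction, (1.0, 0.0), -2.0)
parallel = rotation_center((0.0, 0.0), (1.0, 0.0), (2.0, 3.0), (1.0, 0.0))
```

When the two perpendicular lines are parallel, `rotation_center` returns `None` and every point inherits the known direction, matching the pure translation case described above.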

There  are  at  least  two  sources  for  points  of  known  velocity  direction  in  the 
image.  First,  identifiable  features,  such  as  terminations,  may  be  tracked  in  two 
dimensions.  Second,  for  points  at  which  the  perpendicular  component  of  velocity  is 
zero,  velocity  is  constrained  to  lie  along  the  tangent  to  the  curve.  For  the  case  of 
a  smooth,  rigid,  closed  curve  moving  in  the  image  plane,  there  must  exist  at  least 
two  points  on  the  curve  for  which  the  perpendicular  component  of  velocity  is  zero. 
Suppose we focused our velocity field computation at the locations of zero-crossings
derived  from  the  image.  Since  zero-crossing  contours  are  generally  closed  (except 
at  image  boundaries),  there  will  usually  be  sufficient  constraint  from  the  image  to 
solve  for  the  velocity  field,  in  the  simple  case  of  rigid  motion  in  the  image  plane. 

4.3.  The  Smoothness  Constraint 

In  this  section  we  will  derive  a  different  type  of  constraint  on  the  velocity  field, 
which will allow us to analyze the projected motion of three-dimensional objects
allowed to move freely in space, and deform over time. The specific analysis will
assume  that  we  have  measured  the  perpendicular  components  of  velocity  along 
contours  in  the  image.  However,  the  general  constraint  that  we  present  may  be 
utilized  in  other  motion  measurement  schemes  as  well. 

The expression (2) related V(s) to the global motion parameters $\omega$ and $\mathbf{n}$, and
the shape of the curve $C = (x(s), y(s), z(s))$. The relationship between V(s) and C
is quite simple. If we map the projected two-dimensional velocity vectors along the
curve to a common origin in velocity space, their endpoints map out a scaled, 90°
rotation of the projected image curve (x(s), y(s)), with an additional distortion along
one direction. This distortion is directed perpendicular to the axis $\mathbf{n} = [n_1, n_2, 0]^T$,
and is scaled by the z component of the curve, z(s). This relationship implies, for
example,  that  if  we  have  a  smooth  curve  in  motion,  it  must  generate  a  smoothly 
varying  velocity  field.  The  real  world  consists  predominantly  of  solid  objects,  whose 
surfaces  are  generally  smooth  compared  with  their  distance  from  the  viewer.  Thus, 
intuitively,  we  seek  a  velocity  field  which  is  consistent  with  the  constraints  we  derive 
from  the  changing  image,  and  which  varies  smoothly.  A  single  solution  might  be 
obtained  by  finding  the  velocity  field  which  varies  as  little  as  possible.  A  similar 
argument  was  used  by  Horn  and  Schunck  [35]  to  motivate  the  use  of  a  smoothness 
constraint  for  the  optical  flow  computation.  In  our  case,  we  seek  a  smooth  velocity 
field  along  a  contour. 

To  achieve  this,  we  need  some  means  of  measuring  the  variation  in  velocity 
along  a  contour.  There  are  various  ways  in  which  this  could  be  done.  For  example, 
we  could  measure  the  change  in  direction  of  velocity  as  we  trace  along  the  contour. 
Total  variation  of  the  velocity  field  could  then  be  defined  as  the  total  change  in 
direction  over  the  entire  contour.  A  second  definition  involves  measuring  the  change 
in  magnitude  of  velocity  along  the  contour.  This  leads  to  a  velocity  field  solution  for 
which speed is as uniform as possible along the contour. Finally, we could measure
the change in the full velocity vector, incorporating both the direction and
magnitude of velocity.

In order to define the variation of the velocity field more formally, first recall
the decomposition of velocity into components tangent and perpendicular to the
curve:

$$V(s) = v^{T}(s)\,\mathbf{u}^{T}(s) + v^{\perp}(s)\,\mathbf{u}^{\perp}(s) \qquad (1)$$

$\mathbf{u}^{T}(s)$, $\mathbf{u}^{\perp}(s)$ and $v^{\perp}(s)$ can be measured directly from the changing image. $v^{T}(s)$
is unknown, and must be recovered in order to compute the velocity field V(s).
Aside from knowing $v^{\perp}(s)$ everywhere along the curve, there may also be points at
which the direction and magnitude of velocity, and hence both $v^{\perp}(s)$ and $v^{T}(s)$,
are known. In addition, the direction of velocity alone, and hence the ratio
$v^{\perp}(s)/v^{T}(s)$, may be known at points on the curve, for example, where $v^{\perp}(s_i) = 0$ (Section 4.2).
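
The decomposition can be illustrated with a short sketch. It assumes unit tangent vectors and takes the perpendicular unit vector to be the tangent rotated 90° counterclockwise; that orientation convention and the function names are assumptions made here, not taken from the text.

```python
def decompose(velocity, tangent):
    """Split a velocity into (tangential, perpendicular) scalar components.

    tangent is a unit vector along the curve; the perpendicular unit
    vector is assumed to be the tangent rotated 90 degrees counterclockwise.
    """
    tx, ty = tangent
    nx, ny = -ty, tx
    v_tan = velocity[0] * tx + velocity[1] * ty
    v_perp = velocity[0] * nx + velocity[1] * ny
    return v_tan, v_perp

def recombine(v_tan, v_perp, tangent):
    """Rebuild V = v_tan * u_tan + v_perp * u_perp."""
    tx, ty = tangent
    nx, ny = -ty, tx
    return (v_tan * tx + v_perp * nx, v_tan * ty + v_perp * ny)

tangent = (0.6, 0.8)
v_tan, v_perp = decompose((2.0, -1.0), tangent)
rebuilt = recombine(v_tan, v_perp, tangent)
# A different tangential component gives a different velocity with the
# same perpendicular component, which is why v_tan must be recovered.
other = recombine(v_tan + 1.0, v_perp, tangent)
```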

We  can  now  consider  a  more  formal  means  for  measuring  the  variation  in  the 
velocity  field.  Mathematically,  this  can  be  accomplished  by  defining  a  functional 
$\Theta$, which maps the space of all possible vector fields (along the contour), $\mathcal{V}$, into
the real numbers: $\Theta: \mathcal{V} \to \mathbb{R}$. This functional should be such that the smaller the
variation in the velocity field, the smaller the real number assigned to it. Two
candidate velocity fields may then be compared, by comparing their corresponding
real numbers. This raises the question of what functional should be used to measure
the variation of a velocity field. In the remainder of this section, we will evaluate
a set of possible functionals, based on the three measures of variation that we
previously  presented  informally:  (1)  variation  in  V(s),  (2)  variation  in  the  direction 
of  velocity,  and  (3)  variation  in  the  magnitude  of  velocity,  all  with  respect  to  the 
curve. 

(1)  Variation  in  V(s) 

A scalar measure of the local variation of V(s) with respect to the curve is given by
$|\partial V/\partial s|$, shown in Figure 11a. Two nearby velocity vectors along the image curve are
translated to a common origin in velocity space, where the vector $\partial V/\partial s$ is shown
with a dotted arrow. For convenience of notation, we will omit the argument to
V(s), writing V. A measure of the total variation of the velocity field along the
curve may then be given by the functional:

$$\Theta(V) = \int \left| \frac{\partial V}{\partial s} \right| ds$$

Figure 11. Measuring variation in the velocity field. (a) Change in the full velocity
vector. (b) Change in direction of velocity.

We may also consider variations on this functional, involving higher order derivatives,
or higher powers, such as:

$$\Theta(V) = \int \left| \frac{\partial^2 V}{\partial s^2} \right| ds \qquad \text{or} \qquad \Theta(V) = \int \left| \frac{\partial V}{\partial s} \right|^2 ds$$
(2)  Variation  in  Direction 

Let the direction of velocity be given by the angle $\varphi$, measured in the clockwise
direction from the horizontal, as shown in Figure 11a. In Figure 11b, for two
nearby velocity vectors along the image curve, $\partial\varphi/\partial s$ is shown in velocity space. Total
variation of direction along the curve could be given by functionals such as the
following:

$$\Theta(V) = \int \left| \frac{\partial \varphi}{\partial s} \right| ds$$

or variations involving higher order derivatives, or higher powers.

(3)  Variation  in  Magnitude 

Finally, we could measure the change in magnitude of velocity alone, using
functionals such as:

$$\Theta(V) = \int \left| \frac{\partial |V|}{\partial s} \right| ds$$

Again, we could also consider variations on this measure.
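
For a contour sampled at discrete points, the three measures can be approximated by sums over successive samples. The sketch below is one possible discretization, assumed here for illustration: it treats the contour as open, replaces each derivative by a finite difference (so the sample spacing cancels out of the first-power integrals), and wraps direction changes into the range from -180° to 180°.

```python
import math

def total_variation_measures(velocities):
    """Discrete analogues of the three measures for an open contour.

    Returns (variation of the full vector, variation in direction,
    variation in magnitude). The direction of a zero velocity is
    undefined; math.atan2(0, 0) returns 0 in that case.
    """
    full = direction = magnitude = 0.0
    for (ax, ay), (bx, by) in zip(velocities, velocities[1:]):
        full += math.hypot(bx - ax, by - ay)
        dphi = math.atan2(by, bx) - math.atan2(ay, ax)
        dphi = (dphi + math.pi) % (2 * math.pi) - math.pi  # wrap the angle change
        direction += abs(dphi)
        magnitude += abs(math.hypot(bx, by) - math.hypot(ax, ay))
    return full, direction, magnitude

# A field that first turns by 90 degrees, then doubles its speed.
turning = total_variation_measures([(1.0, 0.0), (0.0, 1.0), (0.0, 2.0)])
# A constant field has zero variation under all three measures.
constant = total_variation_measures([(1.0, 1.0)] * 4)
```

The example separates the measures: the 90° turn contributes to the direction measure but not to the magnitude measure, the change of speed does the opposite, and both contribute to the full-vector measure.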

The  functional  that  we  use  to  measure  smoothness  may  also  incorporate  a 
measure  of  the  velocity  field  itself,  rather  than  strictly  utilizing  changes  in  the 
velocity  field  along  the  curve.  For  example,  we  could  incorporate  a  term  which  is 
a  function  of  |V|.  This  might  be  useful  if  we  sought  a  velocity  field  which  also 
exhibits the least total motion. In addition, the functional could become arbitrarily
complex in its combination of $\partial V/\partial s$, $\partial\varphi/\partial s$, $\partial|V|/\partial s$, or higher order derivatives.

We  have  at  least  three  means  of  evaluating  these  measures  of  smoothness. 
From  a  mathematical  point  of  view,  there  should  exist  a  unique  velocity  field 
which  minimizes  our  particular  measure  of  smoothness;  this  requirement  imposes 
a  set  of  mathematical  constraints  on  our  functional.  Second,  the  velocity  field 
computation  should  yield  physically  plausible  solutions.  Finally,  if  we  suggest  that 
such  a  smoothness  constraint  underlies  the  motion  computation  in  the  human 
visual  system,  this  minimization  should  yield  a  velocity  field  consistent  with  human 
motion  perception. 

An  examination  of  these  smoothness  measures  from  a  physical  and  mathematical 
point of view suggests that a measure involving the full velocity vector, such as
$\Theta(V) = \int |\partial V/\partial s|^2\,ds$, is most appropriate for the velocity field computation [37]. Of
particular importance are the mathematical properties of this functional. It can
be shown that, given a simple condition on the constraints that we derive from
the image, there exists a unique velocity field which satisfies our constraints, and
minimizes $\int |\partial V/\partial s|^2\,ds$. This condition is almost always satisfied by our initial motion
measurements. To obtain this result, we take advantage of the analysis used by
Grimson [70] for evaluating possible functionals for performing surface interpolation
from stereo data. The basic mathematical question is, what conditions on the form
of the functional, and the structure of the space of velocity fields, are needed to
guarantee the existence of a unique solution? These conditions are captured by the
following theorem (see also [71]):

Theorem: Suppose there exists a complete semi-norm $\Theta$ on a space of functions
$H$, and that $\Theta$ satisfies the parallelogram law. Then, every nonempty closed
convex set $E \subset H$ contains a unique element $v$ of minimal norm, up to an
element of the null space. Thus, the family of minimal functions is

$$\{ v + s \mid s \in S \}$$

where

$$S = \{ v - w \mid w \in E \} \cap N$$

and $N$ is the null space of the functional

$$N = \{ u \mid \Theta(u) = 0 \}.$$


It can be shown that the functional $\{\int |\partial V/\partial s|^2\,ds\}^{1/2}$ is a complete semi-norm, which
satisfies the parallelogram law. Second, the space of all possible velocity fields,
which satisfy the constraints derived from the image, is convex. It then follows
from the above theorem that this space contains a unique element of minimal norm,
up to an element of the null space. Since our smoothness measure is non-negative,
minimizing $\{\int |\partial V/\partial s|^2\,ds\}^{1/2}$ is equivalent to minimizing $\int |\partial V/\partial s|^2\,ds$.
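
The parallelogram law for this functional follows from the pointwise identity $|a + b|^2 + |a - b|^2 = 2|a|^2 + 2|b|^2$ for vectors. The short derivation below is a sketch; $W$ denotes a second, arbitrary velocity field along the contour (a symbol introduced here), and subscript $s$ denotes differentiation along the curve.

```latex
\begin{align*}
\Theta(V)^2 &= \int \left| V_s \right|^2 ds, \qquad V_s = \frac{\partial V}{\partial s}, \\
\Theta(V + W)^2 + \Theta(V - W)^2
  &= \int \left( |V_s + W_s|^2 + |V_s - W_s|^2 \right) ds \\
  &= \int \left( 2|V_s|^2 + 2|W_s|^2 \right) ds
   = 2\,\Theta(V)^2 + 2\,\Theta(W)^2 .
\end{align*}
```

$\Theta$ is a semi-norm rather than a norm because $\Theta(V) = 0$ for every constant field $V$, not only for $V = 0$.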

The null space in this case is the set of constant velocity fields, since
$\int |\partial V/\partial s|^2\,ds = 0$ implies $|\partial V/\partial s| = 0$ everywhere, which implies V(s) constant. Suppose
we have a point $(x(s_i), y(s_i))$ on the curve, where $v^{\perp}(s_i)$ is known. This measurement
constrains the velocity $V(s_i)$ to lie along a line parallel to the tangent of the curve
at this point, as shown in Figure 12a. Suppose we have a velocity field which is
consistent with this measure. We can now only add a uniform translation component
along the direction of this line, and still obtain a velocity field consistent with
this local measure. If $v^{\perp}(s)$ is known at a second point $(x(s_j), y(s_j))$, for which the
direction of the tangent is different (see Figure 12b), then we can only add a uniform
translation component along this second direction, and still obtain a velocity field
consistent with $v^{\perp}(s_j)$. However, no nonzero uniform translation can be added to the
entire velocity field while remaining consistent with both local measurements. Thus, we conclude
the following: If $v^{\perp}(s)$ is known at two points, for which the orientation of the
curve is different, then there exists a unique velocity field which satisfies the known
velocity constraints and minimizes $\int |\partial V/\partial s|^2\,ds$. An extended straight line will not
yield measurements for two different orientations, but in all other cases, there will
be sufficient information along a contour to guarantee a unique solution to the
velocity field.

Figure 12. Uniqueness of the velocity field. (a) Constraint provided by a single
measurement. (b) The constraint imposed by two measurements.
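
A discrete version of this minimization can be sketched for a closed contour sampled at equally spaced points. The sketch is not the algorithm referenced in the text: it assumes unit tangents, takes the perpendicular direction to be the tangent rotated 90° counterclockwise, replaces the integral by the sum of squared differences between neighboring velocities, and minimizes that sum by repeated coordinate-wise updates (a Gauss-Seidel style iteration). All names are hypothetical.

```python
import math

def smoothest_velocity_field(tangents, v_perp, sweeps=1000):
    """Minimize sum |V[i+1] - V[i]|^2 around a closed contour, holding the
    measured perpendicular components v_perp fixed.

    Only the tangential components are unknown. Each update sets one of
    them to the value that minimizes the sum with all others held fixed,
    which is the tangential part of the average of the two neighbors.
    """
    n = len(tangents)
    normals = [(-ty, tx) for tx, ty in tangents]
    v_tan = [0.0] * n

    def velocity(i):
        (tx, ty), (nx, ny) = tangents[i], normals[i]
        return (v_tan[i] * tx + v_perp[i] * nx, v_tan[i] * ty + v_perp[i] * ny)

    for _ in range(sweeps):
        for i in range(n):
            px, py = velocity((i - 1) % n)
            qx, qy = velocity((i + 1) % n)
            tx, ty = tangents[i]
            v_tan[i] = 0.5 * ((px + qx) * tx + (py + qy) * ty)
    return [velocity(i) for i in range(n)]

# Hypothetical test case: a circle translating with velocity (1.0, 0.5).
count = 16
angles = [2 * math.pi * k / count for k in range(count)]
tangents = [(-math.sin(a), math.cos(a)) for a in angles]
true_v = (1.0, 0.5)
v_perp = [true_v[0] * -ty + true_v[1] * tx for tx, ty in tangents]
field = smoothest_velocity_field(tangents, v_perp)
```

For the translating circle, the constant field has zero variation and is consistent with every measurement, so the iteration should approach the true translation at every sample.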

We  can  apply  the  constraint  of  least  variation  and  compute  a  projected 
two-dimensional  velocity  field  for  any  three-dimensional  surface,  whether  rigid  or 
non-rigid,  undergoing  general  motion  in  space.  If  we  measure  the  variation  in 
the full velocity vector along a contour in the image, using a functional such as
$\int |\partial V/\partial s|^2\,ds$, we are guaranteed that there exists a unique solution to the velocity
field computation that minimizes this variation. While it is not yet clear that the
general smoothness constraint, or the particular measure $\int |\partial V/\partial s|^2\,ds$, is the most
appropriate for the motion computation, it is important that this measure satisfies
certain essential mathematical requirements, that the other measures do not. For
example, the use of a functional incorporating only a measure of velocity direction,
which will attempt to make the local velocity vectors as parallel as possible, does
not yield functionals which are semi-norms, and consequently, does not lead to a
unique velocity field solution. For a scheme to underlie the motion computation in
the  human  visual  system,  it  is  essential  that  it  be  mathematically  well-founded. 

We should note that an advantage to applying the smoothness constraint along
contours is that the minimization of variation in the velocity field is performed
along one dimension, rather than over two dimensions, as in the case of Horn and
Schunck’s computation of the optical flow [35]. Second, to apply the smoothness
constraint  over  an  area  of  the  image,  it  is  necessary  to  specify  a  neighborhood 
size,  within  which  constraints  will  be  combined,  and  smoothness  imposed  on  the 
velocity  field.  Unless  we  can  define  surface  boundaries  prior  to  the  velocity  field 
computation,  specifying  an  appropriate  area  of  the  image  can  be  difficult.  In 
general,  the  extent  of  contours  is  more  highly  correlated  with  single  surfaces.  The 
smoothness  constraint  can  be  applied  to  single  contours,  reducing  the  problem 
of  integrating  motion  measurements  across  object  boundaries.  Finally,  there  exist 
several  standard  algorithms  for  the  solution  of  optimization  problems  such  as  this 
(see,  e.g.  [37]). 

4.4.  Deriving  Additional  Constraints  from  the  Image 

In  the  previous  section,  we  used  two  sources  of  constraint  on  the  velocity  field 
computation.  From  the  image,  we  utilized  a  single  curve  at  a  particular  moment  in 
time,  together  with  the  instantaneous  measurements  of  the  perpendicular  component 
of velocity along the curve. As a second source of constraint, we computed the
velocity field consistent with these image constraints, which exhibited the least
variation  along  the  curve.  Additional  constraints  can  be  derived  from  the  image  if 
we  do  not  restrict  ourselves  to  the  use  of  instantaneous  measurements;  for  example, 
we  may  utilize  a  second  curve,  at  some  time  later.  If  the  time  interval  is  small, 
then  the  displacement  of  points  along  the  curve  will  also  be  small.  We  can  then 
require  that  each  point  on  the  first  curve  project  to  a  point  on  the  second,  with 
a  velocity  consistent  with  the  instantaneous  perpendicular  component  of  velocity 
$v^{\perp}(s)$; this assumes that velocity is constant over the time interval separating the
two curves. In addition, we could still compute the velocity field which exhibits the
least  variation. 

This  approach  may  yield  a  simpler,  more  robust  algorithm  for  the  velocity 
field  computation,  because  it  utilizes  more  constraint  from  the  image.  However,  it 
has  the  disadvantage  that  we  may  not  be  able  to  obtain  the  theoretical  uniqueness 
results  that  were  possible  when  we  considered  the  perpendicular  components  of 
velocity as a sole source of constraint. A simple example in which the velocity
field solution is not unique is shown in Figure 13. Suppose we are given the initial
constraints shown in Figure 13a. The arrows indicate the perpendicular components
of  velocity  along  the  first  curve,  and  the  dotted  line  indicates  the  second  curve. 
There  are  two  velocity  field  solutions  consistent  with  these  constraints,  shown  in 
Figures  13b  and  c,  corresponding  to  the  two  directions  of  rotation  of  the  circle. 
Both  velocity  fields  exhibit  the  same  total  variation.  In  general,  theoretical  results 
on  uniqueness  may  be  more  difficult  to  obtain  for  this  approach  to  the  velocity  field 
computation.  The  use  of  instantaneous  motion  measurements  alone,  together  with 
the  additional  smoothness  constraint,  as  discussed  in  Section  4.3,  would  yield  the 
velocity  field  given  by  the  vectors  in  Figure  13a,  corresponding  to  pure  expansion 
of  the  circle.  The  additional  constraint  of  the  second  curve  leads  to  a  different 
solution. 


Figure 13. Ambiguity of the velocity field. (a) The initial constraints. (b) Rotation of
the circle to the left. (c) Rotation to the right.

The  availability  of  the  second  curve  may  simplify  the  velocity  field  computation 
in  the  following  way.  The  perpendicular  component  of  velocity,  measured  at  a 
point  p  on  the  first  curve,  constrains  the  velocity  vector  at  p  to  project  to  a  point 
along  the  line  l  in  the  second  frame,  shown  in  Figure  14.  If  in  addition,  p  must 
project  to  a  point  q  on  the  second  curve,  possible  candidates  for  q  may  be  given  by 
the intersection of l with the second curve. In practice, there will be error in the
measurement of $v^{\perp}(s)$, and velocity may not be constant over small time intervals.
As a consequence, we should consider a band in the second frame, to which p must
project.  Candidates  for  q  are  then  given  by  the  intersection  of  this  band  with  the 
second  curve,  shown  in  Figure  14.  If  the  curve  has  fairly  high  local  curvature,  or 
undergoes  rotation,  then  this  intersection  alone  provides  considerable  constraint  on 
the  velocity  field.  However,  in  the  worst  case  of  an  extended  line  undergoing  pure 
translation,  the  second  curve  offers  limited  additional  constraint.  The  computation 
of  a  precise  velocity  field  requires  further  analysis  of  constraints  derived  from  the 
image,  together  with  additional  assumptions.  We  are  presently  exploring  algorithms 
which  utilize  the  smoothness  constraint  for  this  subsequent  computation. 
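
The selection of candidate points q can be sketched as a simple filter on a sampled second curve. The sketch assumes a unit normal at p, a time interval `dt` between the two frames, and a band of half-width `half_width` around the line l; these parameter names, and the function itself, are hypothetical.

```python
def candidate_matches(p, normal, v_perp, dt, second_curve, half_width):
    """Return the samples of second_curve that fall inside the band.

    The line l consists of the points q whose displacement from p has a
    component v_perp * dt along the unit normal at p. The band widens l
    by half_width on either side to tolerate measurement error.
    """
    matches = []
    for q in second_curve:
        along_normal = (q[0] - p[0]) * normal[0] + (q[1] - p[1]) * normal[1]
        if abs(along_normal - v_perp * dt) <= half_width:
            matches.append(q)
    return matches

# Hypothetical data: the line l is y = 1 for this choice of p, normal and v_perp.
second_curve = [(-1.0, 0.5), (0.0, 0.95), (1.0, 1.05), (2.0, 1.6)]
matches = candidate_matches((0.0, 0.0), (0.0, 1.0), 2.0, 0.5, second_curve, 0.1)
```

For a nearly straight second curve parallel to l, almost every sample would pass the filter, which reflects the limited constraint available in the worst case of an extended line undergoing pure translation.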
Figure 14. Use of the constraint provided by a second curve

4.5.  Summary 

We  have  considered  various  additional  constraints  which  may  be  used  in  the 
computation  of  the  velocity  field  from  initial  motion  measurements  derived  from 
the  changing  image.  These  constraints  range  from  the  restricted  assumption  of 
pure  translation  to  the  general  constraint  of  smoothness  of  the  velocity  field,  which 
allows  for  the  arbitrary  movement  of  rigid  or  non-rigid  objects  in  space.  The  use  of 
different  constraints  results  in  considerable  variation  in  the  classes  of  motion  which 
may  be  analyzed,  the  type  of  algorithm,  and  the  extent  of  theoretical  analysis 
required  to  formulate  a  well-defined  computational  problem.  In  analyzing  these 
constraints,  we  have  so  far  restricted  ourselves  to  addressing  purely  computational 
issues.  In  the  next  section,  we  discuss  implications  for  the  biological  computation  of 
motion.  If  the  human  visual  system  does  in  fact  compute  a  detailed  velocity  field, 
it  is  likely  to  use  as  much  constraint  as  possible  from  the  changing  image,  together 
with  the  least  restrictive  additional  constraints  as  necessary,  to  compute  a  unique 
velocity  field. 


5.  Some  Implications  Concerning  the  Biological  Computation  of  Motion 

In  this  section,  we  summarize  the  above  discussion  by  presenting  a  list  of  the  basic 
proposals  that  have  been  made  for  the  computation  of  motion.  In  addition,  we 
discuss  some  of  the  implications  of  these  proposals  for  the  human  visual  system. 

(i)  An  underlying  assumption  of  this  work  is  that  the  local  velocity  field  is  explicitly 
computed  and  represented.  For  the  human  visual  system,  the  idea  that  there  exists 
an  explicit  computation  of  motion,  which  is  different  from  the  description  of  motion 
that could be provided by initial motion detectors, can be motivated by simple
examples. In Figure 15, we show a circle and square undergoing pure translation.
Initial  motion  measurements  provide  the  component  of  motion  in  the  direction 
perpendicular  to  the  local  orientation  of  intensity  changes  in  the  image,  shown 
in  Figure  15a.  Our  perception  of  the  movement  of  the  figures  is  pure  translation, 
indicated  by  the  set  of  velocity  vectors  in  Figure  15b.  A  third  example  is  that  of 
the  rotating  and  translating  curve  of  Figure  9.  While  it  is  not  clear  whether  we 
are  capable  of  explicitly  representing  the  local  velocity  field  around  the  contour, 
we  do  perceive  the  movement  as  the  rotation  and  translation  of  a  rigid  curve.  Such 
an  interpretation  is  not  explicit  in  the  initial  motion  measurements.  For  tasks  such 
as  the  detection  of  a  sudden  movement,  or  separation  of  objects  on  the  basis  of 
differential  motion,  a  precise  local  velocity  field  may  not  be  necessary.  However, 
to  compute  three-dimensional  structure  from  motion,  a  more  detailed  computation 
of  the  velocity  field,  or  an  explicit  correspondence  of  elements  between  frames,  is 
required. 


Figure 15. Computing the local velocity field. (a) The initial motion measurements.
(b) The velocity field corresponding to translation.

(ii)  The  analysis  of  motion  has  been  separated  into  two  distinct  stages;  first,  the 
measurement of motion, and second, the use of motion for tasks such as object
segmentation and structure from motion. This raises the question of whether the
interpretation of three-dimensional structure can influence the computation of
the two-dimensional velocity field. For example, does the assumption of rigidity,
examined in Ullman’s work [13], enter into the velocity field computation?
Psychological experiments [13] suggest that the long range motion correspondence
is not influenced by the interpreted three-dimensional structure of a single view.
The  short  range  process  may  be  similar. 

(iii) We support the idea that there exist two processes for analyzing motion,
corresponding  to  Braddick’s  long  range  and  short  range  processes.  We  suggest  that 
the  long  range  process  is  based  on  a  token-matching  scheme,  while  the  short  range 
process  is  intensity-based.  If  this  view  is  valid,  it  raises  the  following  questions.  How 
do  the  long  range  and  short  range  processes  interact?  Do  subsequent  computational 
tasks,  such  as  object  segmentation  or  structure  from  motion  utilize  the  results  of 
one  or  the  other  process?  The  work  of  Petersik  [72]  suggests  that  the  long  range 
process  may  be  crucial  to  the  recovery  of  structure  from  motion.  Finally,  it  is 
interesting  that  neurophysiological  studies  have  revealed  many  units  which  are 
responsive  to,  or  selective  for,  stimuli  undergoing  continuous  motion.  Little  is  known 
about  the  long  range  process  at  the  neurophysiological  level.  One  obvious  question 
is,  where  in  the  visual  system  can  apparent  motion  phenomena  be  observed  in  the 
response  of  single  units?  Motion  sensitive  units  (for  example  in  areas  V1  and  STS 
or  MT  of  the  monkey)  could  be  tested  for  apparent  motion  response  by  flashing 
bars  at  stationary  locations  using  relatively  wide  separations  (that  is,  wider  than 
the  largest  size  of  simple  cells  at  the  tested  eccentricity).  If  long  range  motion 
units  can  be  identified,  it  may  become  possible  to  go  a  step  further  and  investigate 
the  relationship  between  the  psychophysically  established  correspondence  rules  and 
their  neurophysiological  correlates. 

(iv)  We  suggest  that  the  initial  stage  of  motion  analysis  consists  of  the  measurement 
of  the  perpendicular  component  of  velocity  along  zero-crossing  contours.  This 
can  be  examined  through  neurophysiological  and  psychological  studies.  In  regard 
to  neurophysiology,  is  the  class  of  directionally-selective  simple  cells  detecting 
the  motion  of  zero-crossings  in  their  input  from  the  LGN?  This  is  now  under 
investigation  [Richter,  personal  communication];  initial  results  tend  to  support  this 
claim.  Psychophysical  experiments  can  test  whether  perceived  motion  is  consistent 
with  the  motion  of  zero-crossings. 
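The measurement proposed in (iv) can be illustrated with a minimal sketch. It assumes the standard gradient relation for a translating pattern, S_x u + S_y v + S_t = 0, where S stands for the filtered image whose zero-crossings are tracked; under that assumption only the component of velocity along the local gradient direction (perpendicular to the contour) can be recovered. The function names and the linear-ramp test pattern are hypothetical and serve only as illustration.

```python
import math

def perpendicular_velocity(sx, sy, st):
    """Velocity component along the local gradient direction.

    sx, sy: spatial derivatives of the (assumed) filtered image S.
    st:     temporal derivative of S at the same point.
    Returns (speed, (nx, ny)), where (nx, ny) is the unit normal to the contour.
    """
    mag = math.hypot(sx, sy)
    if mag == 0.0:
        raise ValueError("zero gradient: perpendicular component is undefined")
    # From sx*u + sy*v + st = 0, the component of (u, v) along the normal is -st/|grad S|.
    return -st / mag, (sx / mag, sy / mag)

def derivatives_for_translating_ramp(theta, u, v):
    """Hypothetical test pattern: S = a*(x - u*t) + b*(y - v*t), with (a, b) a unit
    normal at angle theta. Its derivatives are known in closed form."""
    a, b = math.cos(theta), math.sin(theta)
    return a, b, -(a * u + b * v)

# A pattern moving with true velocity (1, 0), seen through contours of two orientations.
oblique = perpendicular_velocity(*derivatives_for_translating_ramp(math.pi / 4, 1.0, 0.0))
parallel = perpendicular_velocity(*derivatives_for_translating_ramp(math.pi / 2, 1.0, 0.0))
```

In this sketch the oblique contour yields only the projection of the true velocity onto its normal, and the contour whose normal is perpendicular to the motion yields essentially no measurable motion, which is why the local measurements alone leave the full velocity field undetermined.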

(v)  We  propose  that  the  local  motion  measurements  are  then  integrated  along  zero- 
crossing  contours.  Again,  this  may  be  explored  through  both  neurophysiology  and 
psychology.  If  the  motion  integration  problem  is  fundamental  to  motion  analysis, 
one  may  expect  to  find  neural  mechanisms  within  the  visual  system  that  are  involved 
in  this  task.  Most  of  the  motion  sensitive  units  studied  so  far  do  not  seem  suitable 
for  the  integration  stage.  Motion  selective  cells  in  the  primary  visual  cortex  of  the 
cat  and  the  monkey  respond  primarily  to  edges  and  bars.  To  activate  such  a  unit  the 
stimulus  must  have  the  preferred  orientation,  and  move  in  the  preferred  direction. 
In  contrast,  promising  candidates  for  the  integration  phase  would  dissociate  the 
effects  of  orientation  and  direction  of  movement,  ideally  exhibiting  specificity  for 
direction  of  motion  but  not  for  orientation.  Furthermore,  the  direction  specificity 
of  such  a  unit  is  expected  to  depend  on  the  range  of  orientations  spanned  by 
the  stimulus.  There  are  indications  for  the  possible  existence  of  such  units  in 
the  posterior  bank  of  the  superior  temporal  sulcus  of  the  rhesus  monkey  [73].  For 
psychophysical  experimentation,  there  are  at  least  two  questions:  first,  is  the  motion 
that  we  perceive  forced  to  be  consistent  at  least  with  the  sign  of  the  local  motion 
measurements  along  zero-crossing  contours,  or  can  it  be  overridden,  for  example,  by 
the  long  range  process,  or  by  the  history  of  the  motion?  Second,  if  the  integration 
does  take  place,  are  measurements  combined  over  neighborhoods  in  the  image,  or 
along  contours?  Wallach’s  [74]  demonstrations  suggest  that  the  integration  may 
take  place  along  contours. 

(vi)  Additional  assumptions  are  required  for  the  motion  integration  problem. 
Regarding  the  human  system,  we  may  first  ask  what  constraints  are  derived  from 
the  changing  image.  Does  the  human  visual  system  strictly  utilize  instantaneous 
measurements  of  velocity,  or  is  a  second  curve,  at  some  small  time  interval  later, 
also  used  to  constrain  the  velocity  field?  Do  we  utilize  an  additional  constraint  on 
smoothness  of  the  velocity  field,  as  described  here?  A  constraint  such  as  smoothness 
may  be  the  least  restrictive  constraint  which  allows  objects  to  move  freely  in  space, 
and  deform,  but  which  still  allows  for  the  computation  of  a  unique  velocity  field. 
Psychophysical  experimentation  is  necessary  to  determine  whether  the  velocity  field 
that  we  perceive  is  the  smoothest  one  possible.  Both  the  short  and  long  range 
processes  face  the  fact  that  in  general,  the  motion  of  elements  is  not  specified 
uniquely  by  information  in  the  changing  image;  do  the  additional  assumptions 
governing  the  computation  of  velocity  or  correspondence  differ  in  the  two  processes, 
or  do  they  differ  only  in  the  constraints  that  are  utilized  from  the  changing  image? 
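As an illustration of how a smoothness constraint can select a unique velocity field from perpendicular measurements, the following is a minimal sketch under stated assumptions. It discretizes a closed contour into points with unit normals and measured perpendicular speeds, and minimizes the sum of squared differences between neighboring velocity vectors plus beta times the squared mismatch between each vector's normal component and its measurement. The discretization, the weight beta, and the Gauss-Seidel relaxation used as the solver are choices made for this sketch and are not taken from the text.

```python
import math

def smoothest_velocity_field(normals, v_perp, beta=1.0, sweeps=5000):
    """Relax toward the field minimizing
        sum_i |V[i+1] - V[i]|^2 + beta * sum_i (V[i] . n[i] - v_perp[i])^2
    on a closed contour (point i's neighbors are i-1 and i+1, wrapping around).
    """
    n_pts = len(normals)
    # Start from the perpendicular components alone.
    field = [(v * nx, v * ny) for (nx, ny), v in zip(normals, v_perp)]
    for _ in range(sweeps):
        for i in range(n_pts):
            px, py = field[i - 1]
            qx, qy = field[(i + 1) % n_pts]
            nx, ny = normals[i]
            # Local optimum solves (2I + beta*n*n^T) V = Vprev + Vnext + beta*v_perp*n.
            rx = px + qx + beta * v_perp[i] * nx
            ry = py + qy + beta * v_perp[i] * ny
            # Closed-form inverse for a unit normal: (I - beta*n*n^T/(2+beta)) / 2.
            proj = (rx * nx + ry * ny) * beta / (2.0 + beta)
            field[i] = (0.5 * (rx - proj * nx), 0.5 * (ry - proj * ny))
    return field

# Hypothetical example: a circle translating with velocity (1.0, 0.5).
N = 16
circle_normals = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
                  for k in range(N)]
true_velocity = (1.0, 0.5)
measured = [true_velocity[0] * nx + true_velocity[1] * ny for nx, ny in circle_normals]
recovered = smoothest_velocity_field(circle_normals, measured)
```

For this example the pure translation has zero variation along the contour and agrees with every measurement, so it is the minimizer and the relaxation should approach it at every point. If all normals were parallel, as on a straight line, the measurements would not constrain the component along the contour and no unique solution would exist.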

(vii)  Finally,  the  motion  measurement  problem  has  some  implications  for  the 
interpretation  of  structure  from  motion.  It  has  been  shown  [17]  that  three- 
dimensional  shape  can  be  recovered  locally,  from  the  instantaneous  velocity  field. 
The  interpretation  is  sensitive,  however,  to  small  errors  in  the  measured  velocity. 
In  light  of  the  inherent  difficulties  in  measuring  the  velocity  field  precisely,  recovery 
methods  that  rely  solely  on  the  instantaneous  velocity  field  appear  unlikely.  For 
the  reliable  recovery  of  three-dimensional  structure  from  motion,  processes  that 
integrate  motion  over  time  are  probably  required. 